---
layout: page
title: Lab 4 - Selection on Observables
permalink: /labs/lab4/
parent: Labs
nav_order: 4
---
Sometimes we cannot randomise treatment. While randomising treatment is the clearly preferable approach - as we learned in the lectures - we still might want to estimate a treatment effect for some interventions or causes we cannot influence. In fact, this might be the case for some of the most interesting research questions in social science.
Simply using observational data comes with all kinds of problems - remember the fundamental problem of causal inference. To mitigate these, we need to assume that, conditional on observable covariates, potential outcomes are independent of treatment status. That is, treated and untreated units may differ, but only in ways we can observe and condition on.
Two assumptions are crucial to such designs based on selection on observables:
\[ (Y_i(1), Y_i(0)) \perp D_i \mid X_i\]
This assumption maintains that potential outcomes are independent of treatment (\(D_i\)), conditional on covariates (\(X_i\)). That is, for the same values of the covariates (or the same \(X_i\)), treatment is as good as random.
\[ 0 < Pr(D_i=1| X_i=x) < 1\]
This assumption, the common support (or overlap) assumption, states that the probability of treatment is neither 0 nor 1 for any given \(X\). There must be variation in the observations we compare to each other: for each level of \(X_i\), there are both treated and untreated units.
Note that these assumptions are not the same as the ones we saw under random assignment. Under random assignment, the assumption is:
\[ (Y_i(1), Y_i(0)) \perp D_i\] which implies that \(E[Y_{0i} | D_i = 1] = E[Y_{0i} | D_i = 0]\) and \(E[Y_{1i} | D_i = 1] = E[Y_{1i} | D_i = 0]\). In other words, in expectation the potential outcomes are the same across groups, regardless of treatment status.
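This is precisely what licenses the simple difference in means under randomisation: substituting the two equalities above (and writing \(\tau^{ATE}\) for the average treatment effect, a label not used in the original),

\[ E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] = E[Y_{1i}] - E[Y_{0i}] = \tau^{ATE}. \]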
Now, once we select on observables, the conditional independence assumption implies that \(E[Y_{0i} | D_i = 1, X_i = x] = E[Y_{0i} | D_i = 0, X_i = x]\) and \(E[Y_{1i} | D_i = 1, X_i = x] = E[Y_{1i} | D_i = 0, X_i = x]\). In other words, the expected potential outcomes of units with the same \(X_i\) are the same, regardless of treatment status.
Accordingly, we can derive our estimator as:
\[\Gamma^{SOO} = E[Y_i(1)| X_i = x] - E[Y_i(0)| X_i = x]\]
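Averaging this conditional contrast over the distribution of \(X_i\) then recovers the ATE - for discrete covariates, written out for completeness:

\[ \tau^{ATE} = \sum_{x} \Pr(X_i = x) \big( E[Y_i(1) \mid X_i = x] - E[Y_i(0) \mid X_i = x] \big). \]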
In principle, then, we can calculate the ATE this way. In practice, however, doing so inevitably involves some degree of uncertainty and imprecision. The conditional independence assumption is a very strong assumption - it can hardly be satisfied in practice. First, we cannot account for unobserved variables. Second, as soon as we are working with continuous variables, we are likely to introduce bias, as we need to make trade-offs regarding the similarity of observations. One major - more general - obstacle is the curse of dimensionality. While we can relatively easily satisfy the condition for one variable (especially if it is binary), this becomes exponentially more difficult as we increase the number of variables: we would need a meaningful number of observations for each combination of covariate values.
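To see how quickly this bites, here is a minimal illustration (hypothetical numbers, not part of the lab's data): with \(k\) binary covariates there are \(2^k\) covariate cells, so even a sample the size of the US House spreads thin very quickly.

# Illustrative only: how exact-matching cells multiply with binary covariates
n <- 435                 # approximate size of the US House
k <- 1:8                 # number of binary covariates we condition on
cells <- 2^k             # distinct covariate combinations
data.frame(covariates = k, cells = cells,
           avg_obs_per_cell = round(n / cells, 1))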
Selection on observables should therefore be done with caution. Some treat it as a last-resort approach - using it if and only if randomisation and other identification strategies are not available. However, there might well be cases in which it can be the most credible design to test hypotheses.
There are two approaches to selecting on observables: regression and matching.
Regression should be familiar to you. Running a multivariate regression means that the estimation of a treatment effect and the conditioning on observables happen in a single step.
Regression is a parametric technique, which often involves extrapolation. That is, the values of \(X\) that are being compared are not necessarily ‘close’ or similar to each other.
As opposed to regression, balancing covariates and the estimation of a treatment effect are separated when we apply matching techniques. This allows us to influence how balance is achieved. In other words, we can make adjustments to the process of making units more comparable. We can, for instance, introduce much narrower restrictions and only compare observations that are identical with respect to certain covariates - think of binary variables and combinations thereof. Matching is a non-parametric estimation technique, which means that the assumptions about functional form we implicitly make when running regressions do not need to be satisfied here.
There are different ways to match observations, such as using the most similar observation (‘nearest neighbour’), defining bandwidths or propensity scores. We will use a few of these approaches in this lab.
Before starting this seminar:
1. Create a folder called "lab4".
2. Download the data (you can use this button or the one at the top, or read csv files directly from github).
3. Open an R script (or Markdown file) and save it in your "lab4" folder.
4. Set your working directory using the setwd() function or by clicking on "More". For example: setwd("~/Desktop/Causal Inference/2024/Lab4")
Let’s install and load the packages that we will be using in this lab:
library(jtools) # generate plots with regression coefficients
library(stargazer) # generate formatted regression tables
library(texreg) # generate formatted regression tables
library(nnet) # If you want to use glm() to check for balance
library(tidyverse) # to conduct some tidy operations
library(ggplot2) # to generate plots
library(Matching) #provides functions for multivariate and propensity score matching and for finding optimal balance based on a genetic search algorithm.
require(cobalt) # Generate balance tables and plots for covariates of groups preprocessed through matching, weighting or subclassification
In this seminar, we will cover the following topics:
1. See that failure to condition on relevant covariates can result in heavily biased (even outright wrong) estimates.
2. Run various multivariate regressions, conditioning on observable covariates.
3. Test and plot visual evidence for the common support assumption.
4. Learn how to match observations.
5. Conduct pre- and post-matching balance checks using the Matching package.
6. Apply propensity score matching, both without and with replacement.
Today, we will be working with data from Ebonya Washington’s work on Female Socialization: How Daughters Affect Their Legislator Fathers’ Voting on Women’s Issues. In this paper, Washington seeks to answer a behavioural question: Does having a daughter alter the political attitudes of members of Congress? To address this question, she develops an identification strategy to reveal the causal effect of having a daughter on liberal voting with regard to issues such as reproductive rights.
Personal circumstances are well known to affect legislators’ opinions and even their voting behaviour. A relatively famous case is that of Ronald Reagan: after he was diagnosed with Alzheimer’s, his inner circle started advocating in favour of stem cell research in the hope of finding a cure, although Reagan had staunchly rejected any use of stem cells for ideological reasons throughout his political career.
This paper looks at the behavioural effect children exert on their parents’ political attitudes. Washington specifically looks at the shift in attitudes towards women’s issues that is associated with raising a daughter. She finds clear evidence that parenting a daughter results in a more liberal voting record for members of Congress with regard to several issues, but particularly so for reproductive rights.
Importantly, Washington finds that - conditional on the total number of children - the effect is strong and significant. Obviously, members of Congress differ in several other characteristics as well, not least in terms of political predisposition. She therefore takes into account further personal as well as political characteristics of legislators.
The identification strategy of this paper rests on one key assumption: conditional on the total number of children, the gender of a legislator’s children is as good as randomly assigned.
Washington concludes that she finds strong support for the
behavioural effect hypothesis, which indeed is most pronounced for
reproductive rights.
Her data set includes the voting records of all the Representatives in the US Congress during the 105th Congress, comprising the years 1997–1998. A description of the variables is listed below:
| Variable | Description |
|---|---|
| nowtot | Congressional voting record score as compiled by the National Organization for Women (NOW). A higher value indicates a more liberal record (the outcome). |
| ngirls | The number of daughters a legislator has (the treatment). |
| totchi | The total number of children of a legislator. |
| repub | Binary variable for membership in the Republican party. |
| female | Binary variable equal to 1 if the legislator is female. |
| white | Binary variable equal to 1 if the legislator is white. |
| reld3 | Binary variable equal to 1 if the legislator is Catholic. |
| age | Continuous variable indicating the legislator’s age. |
| srvlng | Service length in Congress (in years). |
| demvote | Democratic vote share in the legislator’s district. |
| propgirls | Proportion of the legislator’s children that are female. |
Now let’s load the data. There are two ways to do this:
You can load the daughters dataset from your laptop using the load() function. If you stored your data in the directory that you set as your working directory, you only need to write the name of the dataset, plus the extension (.rdata).
# Set your working directory
#setwd("~/Desktop/Causal Inference/2024/Lab4")
#
#load("daughters.rdata") # loading your data from your laptop
# Or you can download the data from the course website from following url:
load(url("https://github.com/bayreuth-politics/CI24/raw/main/docs/data/daughters.rdata"))
Let’s start this lab by inspecting the data.
head(daughters)
# A tibble: 6 × 14
ngirls totchi repub srvlng female white age demvote aauw rtl nowtot nboys
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0 0 0 7 0 1 58 0.570 100 90 90 0
2 1 3 0 15 0 1 54 0.600 88 95 90 2
3 0 0 1 1 0 1 31 0.430 0 0 0 0
4 2 2 0 1 0 1 51 0.520 100 95 95 0
5 2 2 0 7 0 1 39 0.590 100 94 90 0
6 3 7 1 27 0 1 68 0.280 0 0 0 4
# … with 2 more variables: reld3 <dbl>, propgirls <dbl>
That looks as expected. Let us start with a few descriptive statistics to become familiar with the data.
Exercise 1: How many children does the average US representative have?
Reveal Answer
mean(daughters$totchi, na.rm = TRUE)
[1] 2.493088
# or use the summary() command
summary(daughters$totchi)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.000 2.000 2.000 2.493 3.000 10.000 1
Simply using the mean function, we can calculate the
average number of children per observation, i.e. Representative. As we
see, during that term a legislator had on average 2.49
children.
Exercise 2: How many daughters does the average US representative have?
Reveal Answer
average_daughters <- mean(daughters$ngirls, na.rm = TRUE)
print(average_daughters)
[1] 1.274194
# or use the summary() command
summary(daughters$ngirls)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.000 0.000 1.000 1.274 2.000 7.000 1
Following the same logic as above, we can calculate the average
number of daughters - feel free to store the number in an object. A
legislator had on average 1.27 daughters.
Exercise 3: What is the proportion of daughters among US representatives? What would you expect?
Reveal Answer
share_daughters <- sum(daughters$ngirls , na.rm = TRUE)/sum(daughters$totchi, na.rm = TRUE)
print(share_daughters)
[1] 0.5110906
# or use the summary() command
summary(daughters$propgirls)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.0000 0.3333 0.5000 0.5200 0.6667 1.0000 61
We can calculate the proportion of daughters by calculating both the
total number of daughters and the total number of children using the
sum command. Alternatively, we can calculate the average
proportion of daughters among legislators. Note that - while they should
roughly match - these are not the same parameters. We see that
51% of all children were female, while the average
proportion of daughters is 52% - this is roughly what we’d
expect when gender is random.
Exercise 4: What is the NOW score for the average Republican
member? And Democrat?
Hint: Make sure to exclude NAs before calculating the average NOW scores!
Reveal Answer
# Republicans
now_rep <- mean(daughters$nowtot[daughters$repub=="1"], na.rm = TRUE)
now_rep
[1] 12.04846
# Democrats
now_demo <- mean(daughters$nowtot[daughters$repub=="0"], na.rm = TRUE)
now_demo
[1] 73.87255
The average NOW score for Republicans is 12,
while the average score for Democrats is 74. This is a
massive difference, which we would expect to see in Congress, especially
when it comes to rather polarised issues.
Exercise 5: Run a multivariate regression of the NOW score on all the legislator characteristics that you think contribute to explaining voting records, excluding any variable on the number or gender of children.
Reveal Answer
# One possible regression could look like this:
ols_covariates <- lm(nowtot ~ repub + female + age + reld3 + white + demvote + srvlng, data=daughters)
summary(ols_covariates)
Call:
lm(formula = nowtot ~ repub + female + age + reld3 + white +
demvote + srvlng, data = daughters)
Residuals:
Min 1Q Median 3Q Max
-51.640 -9.889 -0.333 10.001 59.447
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.95838 9.39890 0.634 0.52646
repub -47.14717 2.10973 -22.348 < 2e-16 ***
female 12.57095 2.74017 4.588 5.91e-06 ***
age 0.02396 0.11021 0.217 0.82797
reld3 -4.33507 1.82870 -2.371 0.01821 *
white 8.73260 3.17143 2.754 0.00615 **
demvote 102.63605 10.10667 10.155 < 2e-16 ***
srvlng -0.02605 0.13663 -0.191 0.84888
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 16.87 on 423 degrees of freedom
(4 observations deleted due to missingness)
Multiple R-squared: 0.7902, Adjusted R-squared: 0.7868
F-statistic: 227.6 on 7 and 423 DF, p-value: < 2.2e-16
We can see clear differences across legislator characteristics and partisan variables. From the regression we can see that party affiliation, gender, race, religion, and the preferences of voters in the legislator’s district are strong determinants of legislators’ voting records. Republicans (repub) are much less likely to vote for liberal policies in Congress. Similarly, higher Democratic vote shares (demvote) are associated with more liberal voting by legislators. Female representatives tend to have more liberal voting records, as do white representatives. Being Catholic (reld3) is associated with more conservative voting records - which we would expect to be the case.
Let us now assess the randomness of children’s gender. This
assumption is crucial to the research design’s identification
strategy.
In countries like China and India, there is a strong parental preference for boys. Access to prenatal scanning techniques and selective abortion practices have strongly skewed the gender mix towards males. The likelihood of giving birth to a boy rather than a girl is further strongly correlated with parental characteristics. For instance, it is higher among the Indian middle classes than lower down the socio-economic scale. It is also higher in Northern Indian states than in more southerly states. This implies a causal relation from parental characteristics (including, probably, ideological preferences) to the gender of the children. It is important to understand whether US legislators display similar children gender-preferences. Further, are these preferences potentially correlated with the legislator’s characteristics (such as party affiliation, race etc)?
Exercise 6: Test the hypothesis that the gender mix among children of US legislators is 50%. Can we reject the null hypothesis?
Hint: Use the t.test command and set the value under the null hypothesis by adding the argument mu = 0.5.
Reveal Answer
t.test(daughters$propgirls,mu = 0.5)
One Sample t-test
data: daughters$propgirls
t = 1.217, df = 373, p-value = 0.2244
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
0.4876701 0.5523775
sample estimates:
mean of x
0.5200238
The mu argument sets the value under the null hypothesis we are testing. If we do not specify it, the default null hypothesis is a mean of 0. When we specify mu = 0.5, the t-test shows whether we can reject the null hypothesis of a mean value of 0.5. The t-test fails to reject the null hypothesis at the 10% level, as the p-value is 0.22.
Exercise 7: Can we say that the number of daughters is a
randomly allocated variable for legislators with two children? And for
those with one child? How about across partisan lines?
Reveal Answer
# Two-children only
t.test(daughters$propgirls[daughters$totchi==2],mu = 0.5)
One Sample t-test
data: daughters$propgirls[daughters$totchi == 2]
t = 0.38936, df = 137, p-value = 0.6976
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
0.4556672 0.5660719
sample estimates:
mean of x
0.5108696
# One-child only
t.test(daughters$propgirls[daughters$totchi==1],mu = 0.5)
One Sample t-test
data: daughters$propgirls[daughters$totchi == 1]
t = 1.6125, df = 39, p-value = 0.1149
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
0.4681975 0.7818025
sample estimates:
mean of x
0.625
# Reps v Democrats
t.test(daughters$propgirls[daughters$repub==0],
daughters$propgirls[daughters$repub==1])
Welch Two Sample t-test
data: daughters$propgirls[daughters$repub == 0] and daughters$propgirls[daughters$repub == 1]
t = 1.2576, df = 348.43, p-value = 0.2094
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.02355635 0.10710714
sample estimates:
mean of x mean of y
0.5421402 0.5003648
Using the same syntax as before, you can just subset the data to consider legislators with exactly two children ([daughters$totchi==2]). The mean is now even slightly closer to 50% and we again fail to reject the null hypothesis of a mean of 0.5. We cannot reject the null hypothesis for legislators with only one child either. To compare Democrats and Republicans, we use a two-sample t-test as we have done in previous classes. With a p-value of 0.2094, we fail to reject the null hypothesis of no difference between Republicans and Democrats.
Remember: The finding that gender is indeed random is crucial to the paper’s identification strategy.
Before we run any regressions, let us drop observations with missing values. This way, all models are estimated on the same sample and we can make fair comparisons:
daughters <- daughters %>%
drop_na()
Exercise 8: Run a bivariate regression of NOW score on the
number of daughters.
Are you surprised by your findings? In this regression, do you think that the independent variable of interest is uncorrelated with the error term?
Reveal Answer
### To run a regression of NOW score on the number of daughters:
lm_nowtot_bivariate <- lm(nowtot ~ ngirls, data=daughters)
summary(lm_nowtot_bivariate)
Call:
lm(formula = nowtot ~ ngirls, data = daughters)
Residuals:
Min 1Q Median 3Q Max
-43.250 -34.744 -9.744 40.256 60.256
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.250 3.232 13.383 <2e-16 ***
ngirls -1.753 1.765 -0.993 0.321
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 36.79 on 369 degrees of freedom
Multiple R-squared: 0.002667, Adjusted R-squared: -3.53e-05
F-statistic: 0.9869 on 1 and 369 DF, p-value: 0.3211
The point estimate of the relation is negative, which is surprising. However, we cannot say that the independent variable in this regression is uncorrelated with the error term. The number of daughters is clearly positively correlated with the number of children, which in turn is positively correlated with ideological preferences. For instance, very religious and conservative people tend to have larger families and therefore more daughters.
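The direction of this bias follows the textbook omitted-variable-bias formula (standard notation, not from the paper): for the short regression of nowtot on ngirls alone,

\[ \text{bias}(\hat{\beta}_{ngirls}) = \beta_{totchi} \cdot \delta_{totchi,\, ngirls}, \]

where \(\beta_{totchi}\) is the (negative) effect of the total number of children on the NOW score and \(\delta_{totchi,\, ngirls}\) is the (positive) coefficient from regressing totchi on ngirls. The bivariate estimate is therefore biased downwards - exactly the pattern we see.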
Exercise 9: Run a multivariate regression of NOW score on the
number of daughters controlling for the number of
children.
Are you surprised by your findings? In this regression, do you think that the independent variable of interest is uncorrelated with the error term?
Reveal Answer
# To run a regression of NOW score on the number of daughters, controlling for the number of children:
lm_nowtot_multivariate <- lm(nowtot ~ ngirls + totchi, data=daughters)
summary(lm_nowtot_multivariate)
Call:
lm(formula = nowtot ~ ngirls + totchi, data = daughters)
Residuals:
Min 1Q Median 3Q Max
-54.20 -32.20 -10.32 35.86 88.67
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 57.321 4.227 13.560 < 2e-16 ***
ngirls 5.378 2.235 2.406 0.0166 *
totchi -8.499 1.714 -4.958 1.09e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 35.67 on 368 degrees of freedom
Multiple R-squared: 0.06513, Adjusted R-squared: 0.06005
F-statistic: 12.82 on 2 and 368 DF, p-value: 4.155e-06
The point estimate is now strongly positive, which is consistent with our predictions. We have seen that the gender mix among US legislators’ children is random. Therefore, we can conclude that the number of daughters (after we have controlled for the number of children) is uncorrelated with the error term of this regression.
Note that conditioning on the number of children - or ‘controlling’ for it - not only altered the magnitude of the estimated effect of the number of daughters but also its direction. If we had (wrongly) disregarded the total number of children, we would have had a completely false impression of the true effect’s size and direction.
Exercise 10: Run a multivariate regression of NOW score on all relevant variables.
Run the regression and compile a table showing all three regression models using the stargazer package or the screenreg() function from texreg (or a similar package).
Reveal Answer
# To run a regression of NOW score on the number of daughters, controlling for the number of children:
lm_nowtot_all <- lm(nowtot ~ ngirls + totchi + age + repub + female + white + reld3 + demvote + srvlng, data=daughters)
summary(lm_nowtot_all)
Call:
lm(formula = nowtot ~ ngirls + totchi + age + repub + female +
white + reld3 + demvote + srvlng, data = daughters)
Residuals:
Min 1Q Median 3Q Max
-52.137 -9.278 -0.407 8.814 61.286
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.77303 10.09259 0.770 0.44170
ngirls 2.69925 1.04319 2.587 0.01006 *
totchi -2.47275 0.84987 -2.910 0.00384 **
age 0.04931 0.12407 0.397 0.69127
repub -46.69068 2.24967 -20.754 < 2e-16 ***
female 12.73937 2.81649 4.523 8.28e-06 ***
white 8.32240 3.31740 2.509 0.01255 *
reld3 -4.77126 1.97183 -2.420 0.01603 *
demvote 100.71891 10.58918 9.511 < 2e-16 ***
srvlng 0.04046 0.14044 0.288 0.77342
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 16.46 on 361 degrees of freedom
Multiple R-squared: 0.8047, Adjusted R-squared: 0.7998
F-statistic: 165.3 on 9 and 361 DF, p-value: < 2.2e-16
# To produce a table
library(stargazer)
stargazer(lm_nowtot_bivariate, lm_nowtot_multivariate, lm_nowtot_all, title = "Models", column.labels=c("Bivariate","Multivariate", "Complete"), align = TRUE, type = "text")
Models
========================================================================================
Dependent variable:
--------------------------------------------------------------------
nowtot
Bivariate Multivariate Complete
(1) (2) (3)
----------------------------------------------------------------------------------------
ngirls -1.753 5.378** 2.699**
(1.765) (2.235) (1.043)
totchi -8.499*** -2.473***
(1.714) (0.850)
age 0.049
(0.124)
repub -46.691***
(2.250)
female 12.739***
(2.816)
white 8.322**
(3.317)
reld3 -4.771**
(1.972)
demvote 100.719***
(10.589)
srvlng 0.040
(0.140)
Constant 43.250*** 57.321*** 7.773
(3.232) (4.227) (10.093)
----------------------------------------------------------------------------------------
Observations 371 371 371
R2 0.003 0.065 0.805
Adjusted R2 -0.00004 0.060 0.800
Residual Std. Error 36.789 (df = 369) 35.667 (df = 368) 16.459 (df = 361)
F Statistic 0.987 (df = 1; 369) 12.818*** (df = 2; 368) 165.272*** (df = 9; 361)
========================================================================================
Note: *p<0.1; **p<0.05; ***p<0.01
We can see that even after controlling for other characteristics an
increase in the number of daughters is still associated with a
meaningfully and significantly more liberal voting record. By
conditioning on the total number of children we have exogenised
ngirls, which suffered from endogeneity in the bivariate
model.
Generally, when applying matching techniques you should follow these steps (a compact preview appears after we construct the binary treatment below):
1. Run a probit (or logit) with the treatment as outcome and (relevant) covariates as predictors.
2. Predict probabilities - this will create the propensity score. Note here that some (e.g. Sekhon) suggest that the logit score is superior to the propensity score.
3. Match each participant to one or more non-participants using the propensity score. You can do this in a variety of ways: Nearest Neighbour Matching (NN), Caliper Matching, Mahalanobis Distance, Exact Matching etc. The default of the Matching package we use today is NN.
4. Check balance after matching.
5. Estimate the ATT.
You can also use Matching for continuous treatments.
However, to simplify the example in this lab we will turn the continuous
treatment variable (ngirls) into a binary treatment
variable that distinguishes legislators with at least one
daughter from those that do not have any daughters at all. To do so, we
create the binary variable treat:
# Creating a binary treatment variable.
daughters <- daughters %>%
mutate(treat = ifelse(ngirls > 0, 1, 0))
table(daughters$treat)
0 1
59 312
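With treat in hand, the five steps above look roughly like this in compact form - a sketch only; each step is unpacked in the exercises that follow:

# Compact preview of the PSM pipeline - each step is unpacked below
ps_preview <- glm(treat ~ totchi + repub + srvlng + female + white + age + demvote + reld3,
                  data = daughters, family = binomial("logit"))   # 1. logit for treatment
pscore <- ps_preview$fitted                                       # 2. propensity score
m_preview <- Match(Y = daughters$nowtot, Tr = daughters$treat,
                   X = pscore, replace = FALSE)                   # 3. match (NN is the default)
MatchBalance(treat ~ totchi + repub + srvlng + female + white + age + demvote + reld3,
             match.out = m_preview, data = daughters)             # 4. balance after matching
summary(m_preview)                                                # 5. ATT estimate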
Let us start with a pre-matching balance check. Note that this is essentially the same as a randomisation check, but in the context of matching we call this process a “balance check”. Since the treatment is now binary, we are going to run a logistic regression (or a probit, or - less conservatively - OLS).
Exercise 11: Run a logistic regression with the binary
treatment variable (treat) as outcome.
Make sure to include all covariates from the complete multivariate regression model.
Reveal Answer
pre_matching <- glm(treat ~ totchi + age + repub + female + white + reld3 + demvote + srvlng,
data = daughters, family = binomial("logit"))
summary(pre_matching)
Call:
glm(formula = treat ~ totchi + age + repub + female + white +
reld3 + demvote + srvlng, family = binomial("logit"), data = daughters)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.6325 0.2479 0.4585 0.6491 1.2078
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.10354 1.72165 -0.641 0.5215
totchi 0.76629 0.16640 4.605 4.12e-06 ***
age 0.01556 0.02248 0.692 0.4888
repub -0.11453 0.38568 -0.297 0.7665
female 0.37737 0.54813 0.688 0.4912
white 0.28492 0.55719 0.511 0.6091
reld3 -0.03414 0.34187 -0.100 0.9204
demvote 0.72026 1.82422 0.395 0.6930
srvlng -0.05790 0.02464 -2.350 0.0188 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 325.04 on 370 degrees of freedom
Residual deviance: 288.25 on 362 degrees of freedom
AIC: 306.25
Number of Fisher Scoring iterations: 5
As expected, the total number of children is strongly associated with having at least one daughter. Interestingly, there is a weak negative correlation between previous time served in Congress and having a daughter. Apart from this, however, the other covariates are not significant predictors of the treat variable.
Now, we are going to create a data frame by predicting the outcome
from the regression we just ran. The first column will be the
propensity score, i.e. the predicted values or the
probability of receiving the treatment given the covariates; and the
second will be a binary indicator for those treated
(treat==1). We use the Matching package to do
this.
library(Matching)
ps.df <- data.frame(pr_score = predict(pre_matching, type = "response"),
treat = pre_matching$model$treat)
head(ps.df, n=5)
pr_score treat
1 0.8681191 1
2 0.8610881 1
3 0.7925949 1
4 0.9835946 1
5 0.9620695 1
As you can see, the data frame shows the calculated propensity
scores for each observation as well as the treatment status,
i.e. whether legislators had at least one daughter.
The Common Support Assumption requires that for each value of \(X\), we can compare treated and untreated units. Using the propensity score, we have reduced the dimensionality of our covariates. That is, instead of having to pay attention to individual covariates, we look at the probability of being assigned to treatment. This makes it easier for us to check the assumption. We can do this by plotting the propensity score by treatment status.
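This dimension reduction is licensed by the Rosenbaum-Rubin propensity score theorem: if conditional independence holds given \(X_i\), it also holds given the scalar propensity score \(p(X_i)\):

\[ (Y_i(1), Y_i(0)) \perp D_i \mid X_i \;\Rightarrow\; (Y_i(1), Y_i(0)) \perp D_i \mid p(X_i), \qquad p(X_i) = \Pr(D_i = 1 \mid X_i). \]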
Exercise 12: Plot the distribution of the propensity score by treatment status.
Hint: You can use the sm package’s sm.density.compare command to do so. The first argument is the propensity score, while the second indicates treatment status.
Reveal Answer
require(sm)
sm.density.compare(ps.df$pr_score, ps.df$treat, xlab="Propensity Score")
title(main="Propensity score by treatment status")
text(0.3,4, "Control", col="red")
text(0.3,3.5, "Treated", col="green")
We can see that treated observations are associated with a larger
propensity score than untreated units. This is what we would expect.
Moreover, there is a significant overlap: For a large range of
propensity scores there are both treated and untreated units.
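If you prefer ggplot2 (loaded at the start of this lab), a roughly equivalent plot can be produced as follows - a sketch, not the solution the exercise expects:

# Alternative overlap plot with ggplot2
ggplot(ps.df, aes(x = pr_score, fill = factor(treat))) +
  geom_density(alpha = 0.4) +          # overlapping densities by treatment status
  labs(x = "Propensity Score", fill = "Treated",
       title = "Propensity score by treatment status")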
We are going to use Matching, a package with two important functions:
- MatchBalance() gives you balance before and after matching
- Match() performs propensity score and distance (i.e. Mahalanobis) matching
Before we start our analysis, let’s check balance before matching with this package. Let’s use MatchBalance() to do so. The syntax is the following: MatchBalance(treatment assignment variable ~ covariates, data = dataset). We will later include the match.out argument in this function to compare balance before and after matching.
bal.bm=MatchBalance(treat~ repub + srvlng + female + white + age +
demvote + totchi + reld3 ,data=daughters)
What does this tell us?
At the end of this output, we can see which variables are imbalanced. For example, we see that there is imbalance with regard to the totchi covariate.
Now, let’s match! We use propensity score matching (PSM). By default, we estimate NN - what does this mean? NN sets \(C(\bar{P}_i) = \min_j |\bar{P}_i - \bar{P}_j|\). Basically, the non-participant(s) \(\bar{P}_j\) closest to our participant \(\bar{P}_i\) will be selected as a match. The estimator has the option to perform matching with or without replacement. We will first conduct matching without replacement. It’s important to know, though, that this method has the disadvantage that the final estimate will usually depend on the initial ordering of the treated observations for which the matches were selected.
Let’s start with Step 1 of our list: Run a logit or
probit with the treatment as outcome. To retrieve the propensity score,
we estimate a logit model predicting assignment to treatment – just what
we did above in pre_matching. We’ll call this
ps though it’s exactly the same model as
pre_matching - so feel free to just use the latter.
Exercise 13: Regress the treatment variable treat on totchi, repub, srvlng, female, white, age, demvote, and reld3 using the glm() function. The syntax is the following: glm(treatment assignment variable ~ covariate 1 + covariate 2 ... covariate n, data = data, family = binomial("logit")). Store the output from the glm() function in an object called ps.
Reveal Answer
ps <- glm(treat ~ repub + srvlng + female + white +
age + demvote + reld3 + totchi , data = daughters, family = binomial("logit"))
Next, we are going to match each treated observation with an untreated unit, without replacement (that means we are only going to use unique treatment-control pairs, by contrast to matching with replacement, where the same individual can serve as the control for several treated units). Let’s use the Match() function! The syntax is the following: Match(Y = df$outcome, Tr = df$treat, X = ps$fitted, replace = FALSE). When we set the replace argument equal to FALSE, we are running our matching algorithm without replacement. The second argument, treatment status (Tr), has to be equal to the treatment assignment variable daughters$treat. X is the argument that determines the vector that will be used to match the units. When we are conducting PSM, we set X equal to the propensity score obtained from the predicted values of the logistic regression, ps$fitted.
Exercise 14: Conduct matching using Match() without replacement. Match using propensity scores. Store the output of the Match() function in an object. Then, use the summary() function to report the results. We need to set a seed to replicate our results: set a seed equal to 1111 with set.seed(1111). This allows you to replicate results whenever random numbers are involved.
Reveal Answer
set.seed(1111)
match.ps<- Match(Y=daughters$nowtot, Tr=daughters$treat, X=ps$fitted, replace=FALSE)
summary(match.ps)
Estimate... -0.16949
SE......... 6.5292
T-stat..... -0.025959
p.val...... 0.97929
Original number of observations.............. 371
Original number of treated obs............... 312
Matched number of observations............... 59
Matched number of observations (unweighted). 59
Our point estimate for the ATT is -0.17, which is not statistically significant. Does this seem good?
Let’s compare this point estimate to the regression estimates from
OLS.
Exercise 15: Using the lm() function, regress nowtot on the independent variables treat, totchi, repub, srvlng, female, white, age, demvote, and reld3. Store the output from lm() in an object. Use the summary function summary() to report the results. What is the estimate for the treat variable?
Reveal Answer
multivariate <- lm(nowtot ~ treat + totchi + repub + srvlng + female + white + age + demvote + reld3, data=daughters)
summary(multivariate)
Call:
lm(formula = nowtot ~ treat + totchi + repub + srvlng + female +
white + age + demvote + reld3, data = daughters)
Residuals:
Min 1Q Median 3Q Max
-52.052 -9.092 0.325 8.907 59.732
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.00863 10.22942 0.587 0.5573
treat 4.09385 2.45198 1.670 0.0959 .
totchi -1.35926 0.68149 -1.995 0.0468 *
repub -46.88293 2.25988 -20.746 < 2e-16 ***
srvlng 0.03889 0.14183 0.274 0.7841
female 12.84109 2.83150 4.535 7.85e-06 ***
white 7.94178 3.33734 2.380 0.0178 *
age 0.04104 0.12479 0.329 0.7424
demvote 100.64213 10.64798 9.452 < 2e-16 ***
reld3 -4.79761 1.98234 -2.420 0.0160 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 16.55 on 361 degrees of freedom
Multiple R-squared: 0.8026, Adjusted R-squared: 0.7977
F-statistic: 163.1 on 9 and 361 DF, p-value: < 2.2e-16
We see that the point estimates differ substantially. From matching using nearest neighbour without replacement we got -0.17, whereas from OLS we got 4.09, which is significant at the 10% level.
Now that we’ve done the matching, let’s check balance anew. For that, let’s use the MatchBalance() function. Recall the syntax is the following: MatchBalance(treatment assignment variable ~ covariates, data = dataset), but now let’s include the match.out argument in the function, so we can check balance before and after matching, i.e. MatchBalance(treatment assignment variable ~ covariates, match.out = output of Match, data = dataset).
Exercise 16: Check for balance using MatchBalance() for the Nearest Neighbour matching without replacement.
Reveal Answer
bal.ps=MatchBalance(treat~ repub + srvlng + female + white + age + demvote + totchi + reld3,
match.out=match.ps,data=daughters)
***** (V1) repub *****
Before Matching After Matching
mean treatment........ 0.53846 0.54237
mean control.......... 0.49153 0.49153
std mean diff......... 9.4 10.119
mean raw eQQ diff..... 0.050847 0.050847
med raw eQQ diff..... 0 0
max raw eQQ diff..... 1 1
mean eCDF diff........ 0.023468 0.025424
med eCDF diff........ 0.023468 0.025424
max eCDF diff........ 0.046936 0.050847
var ratio (Tr/Co)..... 0.98066 0.9931
T-test p-value........ 0.51323 0.54963
***** (V2) srvlng *****
Before Matching After Matching
mean treatment........ 8.5865 7.5085
mean control.......... 10.78 10.78
std mean diff......... -29.042 -50.866
mean raw eQQ diff..... 2.1525 3.2712
med raw eQQ diff..... 2 2
max raw eQQ diff..... 8 18
mean eCDF diff........ 0.061969 0.088512
med eCDF diff........ 0.060354 0.084746
max eCDF diff........ 0.13521 0.23729
var ratio (Tr/Co)..... 0.54418 0.39465
T-test p-value........ 0.12162 0.027254
KS Bootstrap p-value.. 0.158 0.036
KS Naive p-value...... 0.32452 0.072155
KS Statistic.......... 0.13521 0.23729
***** (V3) female *****
Before Matching After Matching
mean treatment........ 0.125 0.067797
mean control.......... 0.084746 0.084746
std mean diff......... 12.152 -6.6846
mean raw eQQ diff..... 0.050847 0.016949
med raw eQQ diff..... 0 0
max raw eQQ diff..... 1 1
mean eCDF diff........ 0.020127 0.0084746
med eCDF diff........ 0.020127 0.0084746
max eCDF diff........ 0.040254 0.016949
var ratio (Tr/Co)..... 1.3907 0.81481
T-test p-value........ 0.32993 0.73985
***** (V4) white *****
Before Matching After Matching
mean treatment........ 0.87179 0.88136
mean control.......... 0.84746 0.84746
std mean diff......... 7.268 10.394
mean raw eQQ diff..... 0.016949 0.033898
med raw eQQ diff..... 0 0
max raw eQQ diff..... 1 1
mean eCDF diff........ 0.012169 0.016949
med eCDF diff........ 0.012169 0.016949
max eCDF diff........ 0.024337 0.033898
var ratio (Tr/Co)..... 0.85267 0.80889
T-test p-value........ 0.63372 0.56484
***** (V5) age *****
Before Matching After Matching
mean treatment........ 52.628 52.576
mean control.......... 52.203 52.203
std mean diff......... 4.7262 4.3324
mean raw eQQ diff..... 1.4915 2.1695
med raw eQQ diff..... 1 2
max raw eQQ diff..... 7 17
mean eCDF diff........ 0.033654 0.044329
med eCDF diff........ 0.029444 0.033898
max eCDF diff........ 0.087082 0.11864
var ratio (Tr/Co)..... 0.67918 0.62271
T-test p-value........ 0.77901 0.84838
KS Bootstrap p-value.. 0.694 0.694
KS Naive p-value...... 0.84606 0.80063
KS Statistic.......... 0.087082 0.11864
***** (V6) demvote *****
Before Matching After Matching
mean treatment........ 0.49929 0.46695
mean control.......... 0.51017 0.51017
std mean diff......... -8.5332 -36.717
mean raw eQQ diff..... 0.016271 0.04322
med raw eQQ diff..... 0.01 0.03
max raw eQQ diff..... 0.08 0.16
mean eCDF diff........ 0.02367 0.068927
med eCDF diff........ 0.018307 0.067797
max eCDF diff........ 0.071056 0.16949
var ratio (Tr/Co)..... 0.9728 0.82996
T-test p-value........ 0.55409 0.056839
KS Bootstrap p-value.. 0.878 0.266
KS Naive p-value...... 0.96361 0.36496
KS Statistic.......... 0.071056 0.16949
***** (V7) totchi *****
Before Matching After Matching
mean treatment........ 3.0417 3.1525
mean control.......... 2.1186 2.1186
std mean diff......... 64.22 58.106
mean raw eQQ diff..... 0.9661 1.0339
med raw eQQ diff..... 1 1
max raw eQQ diff..... 4 5
mean eCDF diff........ 0.092302 0.11111
med eCDF diff........ 0.026782 0.067797
max eCDF diff........ 0.29933 0.25424
var ratio (Tr/Co)..... 2.2119 3.3899
T-test p-value........ 1.1455e-08 4.1336e-06
KS Bootstrap p-value.. < 2.22e-16 0.01
KS Naive p-value...... 0.00027523 0.044138
KS Statistic.......... 0.29933 0.25424
***** (V8) reld3 *****
Before Matching After Matching
mean treatment........ 0.27244 0.28814
mean control.......... 0.27119 0.27119
std mean diff......... 0.28019 3.7106
mean raw eQQ diff..... 0 0.016949
med raw eQQ diff..... 0 0
max raw eQQ diff..... 0 1
mean eCDF diff........ 0.00062473 0.0084746
med eCDF diff........ 0.00062473 0.0084746
max eCDF diff........ 0.0012495 0.016949
var ratio (Tr/Co)..... 0.98906 1.0378
T-test p-value........ 0.98437 0.82796
Before Matching Minimum p.value: < 2.22e-16
Variable Name(s): totchi Number(s): 7
After Matching Minimum p.value: 4.1336e-06
Variable Name(s): totchi Number(s): 7
After matching, we are unable to achieve balance for the
totchi covariate. [Why?]
We can see that MatchBalance() reports several sets of balance statistics for each variable. It contains the means of each group and standardised empirical quantile-quantile (eQQ) statistics (these indicate whether two samples come from a common distribution). MatchBalance() also reports summary statistics from the raw empirical-QQ comparisons. It provides the variance ratio of the treatment over the control group, and a t-test of the difference in means. Finally, it also provides summary statistics of the Empirical Cumulative Distribution Function (eCDF), which is the formal name of the cumulative distribution of a sample.
Visualisation helps here. We can plot a balance plot using the love.plot() and bal.tab() functions. The bal.tab() function calculates the standardised difference in means. The love.plot() function generates a plot that displays covariate balance before and after matching. One of its arguments is stat, which determines what statistics you want to report; one option is mean.diffs, which reports the mean differences for each covariate. We can set a threshold at an optional value. We can also present absolute values by setting the abs argument equal to TRUE. Finally, we can set the order in which the statistics are presented by including the var.order argument. If we set var.order equal to “unadjusted”, the statistics will be ordered by the balance statistic of the unadjusted sample. See the example below. Always remember to add the object returned by the Match() function as the first argument of the bal.tab() function.
love.plot(bal.tab(match.ps, treat~ repub + srvlng + female + white + age + demvote + totchi + reld3,
data = daughters),stat = "mean.diffs",
threshold = .1, var.order = "unadjusted" ,abs=T)
Ideally, all covariates would be on the 0 line. The further away the absolute mean differences are, the worse the balance. Now we have seen what happens with the means of the covariates. Let’s see next what happens with their distributions. Remember, it can well be that the means are the same but the distributions are statistically different. We’ll test that like we did last week, with a Kolmogorov-Smirnov statistic.
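As a reminder of what the KS statistic captures, you could compare the two empirical CDFs by hand for a single continuous covariate, say demvote - an illustrative check, not part of the exercise:

# Two-sample KS test on one covariate, by treatment status (illustrative;
# ties in the data may trigger a warning)
ks.test(daughters$demvote[daughters$treat == 1],
        daughters$demvote[daughters$treat == 0])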
Exercise 17: Check for balance for the Nearest Neighbour matching without replacement again, but rather than reporting mean differences, check for balance in the cumulative distributions by setting stat = "ks.statistics".
Reveal Answer
love.plot(bal.tab(match.ps, treat~ repub + srvlng + female + white + age + demvote + totchi + reld3,
data = daughters),stat = "ks.statistics",
threshold = .1, var.order = "unadjusted" ,abs=T)
Again, ideally everything would be at 0 (that would mean the covariate distributions are identical between the treatment and the control group). Note also that Kolmogorov-Smirnov statistics only provide information for continuous covariates.
The balance here is really bad - for some variables even worse than before matching! This doesn’t seem like the model we want to base our analysis on!
Now, let’s use propensity score with replacement. Remember, this means that a single control unit can be matched to several treated units. Matching without replacement can yield very bad matches if the number of comparison observations comparable to the treated observations is small.
How many neighbours should you use? It’s usually better to oversample – use more neighbours. This, however, involves a trade-off between bias and variance. Generally, in choosing between different matching estimators, the one with the best balance is the most satisfactory one.
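If you do want to oversample, the Match() function takes an M argument for the number of matches per treated unit. A sketch with three nearest neighbours per treated unit (with replacement) might look like this:

# Oversampling: three control matches per treated unit (illustrative)
match.ps.m3 <- Match(Y = daughters$nowtot, Tr = daughters$treat,
                     X = ps$fitted, M = 3, replace = TRUE)
summary(match.ps.m3)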
To do matching with replacement, we only need to set the replace argument of the Match() function equal to TRUE. Let’s run matching with replacement.
Exercise 18: Conduct matching using the Match() function, but set the replace argument equal to TRUE. Use summary() to report your results. Interpret the output from the Match() function. With matching with replacement, do you get closer to the multivariate regression benchmark? Do you think matching with replacement leads to more or less bias compared to matching without replacement? Does it lead to more or less variance? Remember to set the treatment status (Tr) argument equal to daughters$treat. Set X equal to the propensity score obtained from the predicted values of the logistic regression, ps$fitted.
Reveal Answer
match.ps2 <- Match(Y=daughters$nowtot, Tr=daughters$treat, X=ps$fitted,
replace=T)
summary(match.ps2)
Estimate... -3.9583
AI SE...... 9.5929
T-stat..... -0.41263
p.val...... 0.67988
Original number of observations.............. 371
Original number of treated obs............... 312
Matched number of observations............... 312
Matched number of observations (unweighted). 316
We find that the estimate is now -3.96 - very far from the multivariate estimate. We even got a negative coefficient! Regarding the output from the Match() function, we now also have AI SE, which is the matching-corrected standard error of Abadie and Imbens, 2022. The T-stat and p.val are the standard t-statistic and p-value of the point estimate, and they are interpreted as usual.
The variance-bias trade off: Matching with replacement has the advantage of always matching each treated unit to the closest untreated unit, therefore reducing bias as compared to matching without replacement. On the other hand, variance is increased for matching with replacement as the estimates might be highly contingent on a small number of observations, whereas matching without replacement exhibits lower variance.
Exercise 19: Let’s check if we actually achieved balance using the MatchBalance() function on a set of covariates. One thing that we need to do is subset our data so we can conduct our balance test on a data frame that contains only covariates. Let’s drop the outcome variables and some covariates from the analysis. Let’s also remove columns 1, 9, 10, 11, 12, and 14, which are either outcomes or variables that have been created as a function of other variables. You can use square brackets to remove these columns, with the following syntax: data.frame[ ,-c(1,9:12, 14)]. Then, use the bal.tab() function to report the covariate balance. The syntax is the following: bal.tab(object output by the Match function, treatment assignment variable ~ covariates, data = dataframe (excluding columns 1, 9, 10, 11, 12 and 14)). Remember to install/load the cobalt library to use the bal.tab() function. Identify in which variables there is imbalance before and after matching.
Reveal Answer
balancepost=MatchBalance(treat~., match.out=match.ps2, data=daughters[,-c(1, 9:12, 14)])
We observe that before matching there is imbalance for the following variable(s): totchi, which is the total number of children. After matching we get imbalance for the following covariates: srvlng, age, and totchi. That’s bad!
Let’s see the results using the bal.tab() function
bal.tab(match.ps2, treat~., data=daughters[,-c(1, 9:12, 14)])
The bal.tab() reports substantially less information. It
reports the standardised difference between the average of the treated
and untreated units after matching. It also provides the numbers of
observations in the treatment and control group, and the effective
sample size, which is the number of observations matched in both
groups.
Now plot a balance plot for Matching with replacement.
Exercise 20: Generate a balance plot before and after matching using the love.plot() and bal.tab() functions. Include the following arguments: set the stat argument equal to “variance.ratios”. As you may imagine, variance ratios are essentially the ratio of the variance of a covariate for the treated group over the variance of the same covariate for the control group. Also, let’s order the variables by the size of the variance ratio for the adjusted measures by setting the var.order argument equal to “adjusted”. Finally, set the abs argument equal to TRUE.
Reveal Answer
daughters_cov=daughters[,-c(1, 9:12, 14)]
love.plot(bal.tab(match.ps2, treat~., data=daughters[,-c(1, 9:12, 14)]), stat = "variance.ratios", var.order = "adjusted", abs = TRUE)
We don’t see much improvement in terms of variance balance - again, balance is much worse for some variables after matching.
So far the closest we’ve got to the multivariate regression benchmark has been using matching without replacement. Let’s try Caliper Matching now. Caliper Matching is a variant of Nearest Neighbour matching that attempts to avoid ‘bad’ matches (\(\bar{P}_j\) that are far from our \(\bar{P}_i\)). In order to do this, Caliper Matching imposes a tolerance on the maximum distance between treated and untreated units - in other words, a limit on how bad a match we are willing to accept. If, for some participants, no matches are found within the specified caliper (distance), then they will be excluded from the analysis. Theoretically, Caliper should be better than pure NN, since it corrects for ‘bad’ matches.
Using a caliper should improve the quality of matching and also enforces common support because untreated units far from the treated units are discarded.
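Formally, with a caliper of \(c\) standard deviations, units \(i\) and \(j\) are only an admissible match if (writing \(\hat{p}\) for the estimated propensity score and \(\sigma_{\hat{p}}\) for its standard deviation; notation ours):

\[ |\hat{p}_i - \hat{p}_j| \le c \cdot \sigma_{\hat{p}}. \]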
Let’s estimate Caliper with a distance of 0.1 standard deviations.
Exercise 21: Perform matching setting a caliper of 0.1. To do this, add the following argument to the Match() function: caliper = 0.1. Remember to place a comma between the arguments. Set the replace argument equal to FALSE (or F). Use the summary() function to report your results.
Reveal Answer
match.ps.cal0.1=Match(Y=daughters$nowtot, Tr=daughters$treat,
X=ps$fitted, caliper=0.1, replace=F)
summary(match.ps.cal0.1)
Estimate... 4.7273
SE......... 6.9549
T-stat..... 0.6797
p.val...... 0.49669
Original number of observations.............. 371
Original number of treated obs............... 312
Matched number of observations............... 55
Matched number of observations (unweighted). 55
Caliper (SDs)........................................ 0.1
Number of obs dropped by 'exact' or 'caliper' 257
We got 4.73. That’s better than before (assuming that there is, in fact, a positive effect of having daughters), even though it is not significant. At least we got a positive coefficient. Note that the number of matched observations has decreased substantially. How’s the balance, though?
Exercise 22: Perform a balance check using the MatchBalance() function. Include the following variables: repub, srvlng, female, white, age, totchi, demvote, reld3. Identify which variables are imbalanced before and after matching. Remember to store the output from the Match() function in a new object. Then set the match.out argument equal to this new output.
Reveal Answer
bal.ps.cal0.1=MatchBalance(treat~ repub + srvlng + female + white + age +
demvote + reld3 + totchi,
match.out=match.ps.cal0.1,data=daughters)
We find that using a caliper of 0.1 - and avoiding bad matches - we
are finally able to balance the covariate totchi!
Exercise 23: Again, plot the results of your covariate balance evaluation as you did before. Use the love.plot(bal.tab(....)) functions to generate this plot. Report the differences in means by setting stat = "mean.diffs". Include a threshold by setting the threshold argument equal to .1. Order the unadjusted differences in means in decreasing order by adding the argument var.order = "unadjusted". Finally, include the abs = TRUE argument, so you report only absolute values of the differences in means. This is the syntax putting everything together: love.plot(bal.tab(matching output, treat variable ~ covariates, data = data), stat = "mean.diffs", threshold = .1, var.order = "unadjusted", abs = T).
Reveal Answer
love.plot(bal.tab(match.ps.cal0.1, treat ~ repub + srvlng + female + white + age + demvote + reld3 + totchi,
data = daughters),stat = "mean.diffs",
threshold = .1, var.order = "unadjusted" ,abs=T)
This looks pretty decent! Even if not perfect, this is something one could work with - note the improvement as compared to previous balance plots! Finally, we achieve balance in most of the covariates!
Exercise 24: Try again by changing the caliper to 0.25. Perform matching using the Match() function. Do this matching without replacement, so set the replace argument equal to FALSE (F).
Reveal Answer
match.ps.cal25 <- Match(Y = daughters$nowtot, Tr = daughters$treat, X = ps$fitted,
caliper = 0.25, replace = F)
summary(match.ps.cal25)
Estimate... 4.6364
SE......... 6.8669
T-stat..... 0.67517
p.val...... 0.49957
Original number of observations.............. 371
Original number of treated obs............... 312
Matched number of observations............... 55
Matched number of observations (unweighted). 55
Caliper (SDs)........................................ 0.25
Number of obs dropped by 'exact' or 'caliper' 257
Again, we find a point estimate somewhat larger than our regression benchmark of 4.09: the estimated coefficient is now 4.63. It doesn’t change substantially compared to the caliper of 0.1. It seems that most observations are tightly clustered around a single propensity score. If you look at the previous matching with a caliper of 0.1, 55 units were matched, and with a 0.25 caliper it is the same number! So increasing the caliper made no difference. Note the rather small number of observations that can be matched even with a caliper of 0.25. The estimate of our coefficient seems about right (under the assumption that there is an effect), but we would still fail to reject the null hypothesis given the p-value. We’d conclude that we’d need more observations here!
Note also that our findings from caliper matching indicate that NN matching included quite a lot of matches between observations that were rather distinct in terms of their propensity scores: while they were the nearest available neighbours, they were not actually good neighbours. Bearing in mind the variance-bias trade-off, it seems that NN matching (without replacement) decreased the variance at the cost of introducing a meaningful bias.
Exercise 25: Let’s examine again how well the matching procedure did in producing balance. Check for balance using the MatchBalance() function. Remember to store the output of the Match() function in an object and pass this object to the match.out argument.
Reveal Answer
# let's check for balance before and after matching using a 0.25 caliper.
bal.ps.cal25 = MatchBalance(treat ~ repub + srvlng + female + white + age +
demvote + reld3 + totchi,
match.out = match.ps.cal25, data = daughters)
***** (V1) repub *****
Before Matching After Matching
mean treatment........ 0.53846 0.54545
mean control.......... 0.49153 0.52727
std mean diff......... 9.4 3.6181
mean raw eQQ diff..... 0.050847 0.018182
med raw eQQ diff..... 0 0
max raw eQQ diff..... 1 1
mean eCDF diff........ 0.023468 0.0090909
med eCDF diff........ 0.023468 0.0090909
max eCDF diff........ 0.046936 0.018182
var ratio (Tr/Co)..... 0.98066 0.99469
T-test p-value........ 0.51323 0.84218
***** (V2) srvlng *****
Before Matching After Matching
mean treatment........ 8.5865 9.7273
mean control.......... 10.78 9.3455
std mean diff......... -29.042 4.9089
mean raw eQQ diff..... 2.1525 1.0364
med raw eQQ diff..... 2 0
max raw eQQ diff..... 8 4
mean eCDF diff........ 0.061969 0.034091
med eCDF diff........ 0.060354 0.027273
max eCDF diff........ 0.13521 0.090909
var ratio (Tr/Co)..... 0.54418 0.82865
T-test p-value........ 0.12162 0.78024
KS Bootstrap p-value.. 0.138 0.846
KS Naive p-value...... 0.32452 0.97691
KS Statistic.......... 0.13521 0.090909
***** (V3) female *****
Before Matching After Matching
mean treatment........ 0.125 0.054545
mean control.......... 0.084746 0.090909
std mean diff......... 12.152 -15.867
mean raw eQQ diff..... 0.050847 0.036364
med raw eQQ diff..... 0 0
max raw eQQ diff..... 1 1
mean eCDF diff........ 0.020127 0.018182
med eCDF diff........ 0.020127 0.018182
max eCDF diff........ 0.040254 0.036364
var ratio (Tr/Co)..... 1.3907 0.624
T-test p-value........ 0.32993 0.41498
***** (V4) white *****
Before Matching After Matching
mean treatment........ 0.87179 0.87273
mean control.......... 0.84746 0.87273
std mean diff......... 7.268 0
mean raw eQQ diff..... 0.016949 0
med raw eQQ diff..... 0 0
max raw eQQ diff..... 1 0
mean eCDF diff........ 0.012169 0
med eCDF diff........ 0.012169 0
max eCDF diff........ 0.024337 0
var ratio (Tr/Co)..... 0.85267 1
T-test p-value........ 0.63372 1
***** (V5) age *****
Before Matching After Matching
mean treatment........ 52.628 52
mean control.......... 52.203 51.164
std mean diff......... 4.7262 9.3791
mean raw eQQ diff..... 1.4915 1.4545
med raw eQQ diff..... 1 1
max raw eQQ diff..... 7 6
mean eCDF diff........ 0.033654 0.032536
med eCDF diff........ 0.029444 0.018182
max eCDF diff........ 0.087082 0.090909
var ratio (Tr/Co)..... 0.67918 0.79821
T-test p-value........ 0.77901 0.65522
KS Bootstrap p-value.. 0.66 0.946
KS Naive p-value...... 0.84606 0.97691
KS Statistic.......... 0.087082 0.090909
***** (V6) demvote *****
Before Matching After Matching
mean treatment........ 0.49929 0.48182
mean control.......... 0.51017 0.49945
std mean diff......... -8.5332 -13.764
mean raw eQQ diff..... 0.016271 0.034364
med raw eQQ diff..... 0.01 0.02
max raw eQQ diff..... 0.08 0.12
mean eCDF diff........ 0.02367 0.058442
med eCDF diff........ 0.018307 0.054545
max eCDF diff........ 0.071056 0.14545
var ratio (Tr/Co)..... 0.9728 1.1032
T-test p-value........ 0.55409 0.42803
KS Bootstrap p-value.. 0.882 0.466
KS Naive p-value...... 0.96361 0.60572
KS Statistic.......... 0.071056 0.14545
***** (V7) reld3 *****
Before Matching After Matching
mean treatment........ 0.27244 0.25455
mean control.......... 0.27119 0.27273
std mean diff......... 0.28019 -4.1358
mean raw eQQ diff..... 0 0.018182
med raw eQQ diff..... 0 0
max raw eQQ diff..... 0 1
mean eCDF diff........ 0.00062473 0.0090909
med eCDF diff........ 0.00062473 0.0090909
max eCDF diff........ 0.0012495 0.018182
var ratio (Tr/Co)..... 0.98906 0.95667
T-test p-value........ 0.98437 0.82801
***** (V8) totchi *****
Before Matching After Matching
mean treatment........ 3.0417 2.2909
mean control.......... 2.1186 2.1818
std mean diff......... 64.22 10.406
mean raw eQQ diff..... 0.9661 0.14545
med raw eQQ diff..... 1 0
max raw eQQ diff..... 4 1
mean eCDF diff........ 0.092302 0.020779
med eCDF diff........ 0.026782 0.018182
max eCDF diff........ 0.29933 0.072727
var ratio (Tr/Co)..... 2.2119 1.1826
T-test p-value........ 1.1455e-08 0.27295
KS Bootstrap p-value.. < 2.22e-16 0.716
KS Naive p-value...... 0.00027523 0.99864
KS Statistic.......... 0.29933 0.072727
Before Matching Minimum p.value: < 2.22e-16
Variable Name(s): totchi Number(s): 8
After Matching Minimum p.value: 0.27295
Variable Name(s): totchi Number(s): 8
We finally find balance for all covariates!
Below we have a table that summarises the results from all the matching techniques and the multivariate regression. As we can see, Caliper matching did a relatively good job compared to the other matching techniques, if we assume there is a true significant effect about the size of the OLS estimate. Pay attention to the random order of observations and the variance-bias trade-off when doing matching. Often, the output of different estimators will give you an idea of what is ‘wrong’ with your data if you get vastly divergent results.
| Matching technique | Estimate |
|---|---|
| Multivariate | 4.09 |
| NN without rep | -0.17 |
| NN with rep | -3.96 |
| Caliper (0.1) | 4.73 |
| Caliper (0.25) | 4.63 |